A Semi-Supervised Learning Approach for Acoustic-Prosodic Personality Perception in Under-Resourced Domains

نویسندگان

  • Rubén Solera-Ureña
  • Helena Moniz
  • Fernando Batista
  • Vera Cabarrão
  • Anna Pompili
  • Ramón Fernández Astudillo
  • Joana Campos
  • Ana Paiva
  • Isabel Trancoso
چکیده

Automatic personality analysis has gained attention in the last years as a fundamental dimension in human-to-human and human-to-machine interaction. However, it still suffers from limited number and size of speech corpora for specific domains, such as the assessment of children’s personality. This paper investigates a semi-supervised training approach to tackle this scenario. We devise an experimental setup with age and language mismatch and two training sets: a small labeled training set from the Interspeech 2012 Personality Sub-challenge, containing French adult speech labeled with personality OCEAN traits, and a large unlabeled training set of Portuguese children’s speech. As test set, a corpus of Portuguese children’s speech labeled with OCEAN traits is used. Based on this setting, we investigate a weak supervision approach that iteratively refines an initial model trained with the labeled data-set using the unlabeled data-set. We also investigate knowledge-based features, which leverage expert knowledge in acoustic-prosodic cues and thus need no extra data. Results show that, despite the large mismatch imposed by language and age differences, it is possible to attain improvements with these techniques, pointing both to the benefits of using a weak supervision and expert-based acoustic-prosodic features across age and language.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Assessing context and learning for isizulu tone recognition

Prosody plays an integral role in spoken language understanding. In isiZulu, a Nguni family language with lexical tone, prosodic information determines word meaning. We assess the impact of models of tone and coarticulation for tone recognition. We demonstrate the importance of modeling prosodic context to improve tone recognition. We employ this less commonly studied language to assess models ...

متن کامل

Automatic Pronunciation Generation by Utilizing a Semi-Supervised Deep Neural Networks

Phonemic or phonetic sub-word units are the most commonly used atomic elements to represent speech signals in modern ASRs. However they are not the optimal choice due to several reasons such as: large amount of effort required to handcraft a pronunciation dictionary, pronunciation variations, human mistakes and under-resourced dialects and languages. Here, we propose a data-driven pronunciation...

متن کامل

Sentiment Classification in Under-Resourced Languages Using Graph-Based Semi-Supervised Learning Methods

In sentiment classification, conventional supervised approaches heavily rely on a large amount of linguistic resources, which are costly to obtain for under-resourced languages. To overcome this scarce resource problem, there exist several methods that exploit graph-based semisupervised learning (SSL). However, fundamental issues such as controlling label propagation, choosing the initial seeds...

متن کامل

Semi-supervised extractive speech summarization via co-training algorithm

Supervised methods for extractive speech summarization require a large training set. Summary annotation is often expensive and time consuming. In this paper, we exploit semi-supervised approaches to leverage unlabeled data. In particular, we investigate co-training for the task of extractive meeting summarization. Compared with text summarization, speech summarization task has its unique charac...

متن کامل

Big Five vs. Prosodic Features as Cues to Detect Abnormality in SSPNET-Personality Corpus

This paper presents an attempt to evaluate three different sets of features extracted from prosodic descriptors and Big Five traits for building an anomaly detector. The Big Five model enables to capture personality information. Big Five traits are extracted from a manual annotation while Prosodic features are extracted directly from the speech signal. Two different anomaly detection methods ar...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017